24 research outputs found

Population Structure and Genetic Diversity in a Rice Core Collection (Oryza sativa L.) Investigated with SSR Markers

Author: Jinquan Li
Nicholas James Provart
Peng Zhang
Xiangdong Liu
Xiaoling Li
Xingjuan Zhao
Yonggen Lu
Publication venue: Public Library of Science
Publication date: 02/12/2011

Assessing the genetic diversity and population structure of a core collection would facilitate the use of its germplasm and its application in association mapping. The objectives of this study were to (1) examine the population structure of a rice core collection; (2) investigate the genetic diversity within and among subgroups of the rice core collection; and (3) identify the extent of linkage disequilibrium (LD) in the rice core collection. A rice core collection of 150 varieties, established from 2260 varieties of Ting's collection of rice germplasm, was genotyped with 274 SSR markers and used in this study. Two distinct subgroups (i.e. SG 1 and SG 2) were detected within the entire population by different statistical methods, in accordance with the differentiation of indica and japonica rice. MCLUST analysis might be an alternative method to STRUCTURE for population structure analysis. A subset of 26% of the total markers could detect the population structure with precision similar to that of the whole SSR marker set. Gene diversity and MRD between the two subspecies varied considerably across the genome, which might be used to identify candidate genes for the traits under domestication and artificial selection of indica and japonica rice. The percentage of SSR loci pairs in significant (P<0.05) LD is 46.8% in the entire population, and the ratio of linked to unlinked loci pairs in LD is 1.06. Across the entire population as well as the subgroups and sub-subgroups, LD decays with genetic distance, indicating that linkage is one main cause of LD. The results of this study provide valuable information for future association mapping using the rice core collection.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

iCanPlot: Visual Exploration of High-Throughput Omics Data Using Interactive Canvas Plotting

Author: A Bateman
A Subramanian
Amit U. Sinha
AU Sinha
D Blankenberg
KM Bernt
M Ashburner
M Bostock
M Kanehisa
M Reich
Nicholas James Provart
Scott A. Armstrong
Publication venue: Public Library of Science
Publication date: 29/02/2012

Increasing use of high-throughput, genome-scale assays requires effective visualization and analysis techniques to facilitate data interpretation. Moreover, existing tools often require programming skills, which discourages bench scientists from examining their own data. We have created iCanPlot, a compelling platform for visual data exploration based on the latest technologies. Using the recently adopted HTML5 Canvas element, we have developed a highly interactive tool to visualize tabular data and identify interesting patterns in an intuitive fashion without the need for any specialized computing skills. A module for geneset overlap analysis has been implemented on the Google App Engine platform: when the user selects a region of interest in the plot, the genes in the region are analyzed on the fly. The visualization and analysis are amalgamated for a seamless experience. Further, users can easily upload their data for analysis, which also makes it simple to share the analysis with collaborators. We illustrate the power of iCanPlot by showing an example of how it can be used to interpret histone modifications in the context of gene expression.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Intuitive Visualization and Analysis of Multi-Omics Data and Application to Escherichia coli Carbon Metabolism

Author: A Funahashi
A Liebers
Brice Enjalbert
C Médigue
D Noble
EW Sayers
F Frankel
F Le Fèvre
Fabien Jourdan
H Neuweger
I Herman
I Schomburg
IM Keseler
Jean-Charles Portais
K Wegner
KF Aoki
N Gehlenborg
N Ishii
N Le Novère
Nicholas James Provart
P Shannon
PT Shannon
R Bourqui
S Gama-Castro
SM Paley
Publication venue: Public Library of Science
Publication date: 01/01/2011

Combinations of ‘omics’ investigations (i.e., transcriptomic, proteomic, metabolomic and/or fluxomic) are increasingly applied to gain a comprehensive understanding of biological systems. Because the latter are organized as complex networks of molecular and functional interactions, the intuitive interpretation of multi-omics datasets is difficult. Here we describe a simple strategy to visualize and analyze multi-omics data. Graphical representations of complex biological networks can be generated using Cytoscape, where all molecular and functional components can be explicitly represented using a set of dedicated symbols. This representation can be used i) to compile all biologically relevant information regarding the network through web link association, and ii) to map the network components with multi-omics data. A Cytoscape plugin was developed to increase the possibilities of both multi-omics data representation and interpretation. This plugin allowed different adjustable colour scales to be applied to the various omics data and performed the automatic extraction and visualization of the most significant changes in the datasets. For illustration purposes, the approach was applied to the central carbon metabolism of Escherichia coli. The obtained network contained 774 components and 1232 interactions, highlighting the complexity of bacterial multi-level regulations. The structured representation of this network represents a valuable resource for systemic studies of E. coli, as illustrated by the application to multi-omics data. Some current issues in network representation are discussed on the basis of this work.

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

HAL-INSA Toulouse

ProdInra

Identifying Cis-Regulatory Sequences by Word Profile Similarity

Author: A Ivan
A Nasiadka
A Sosinsky
AG Nazina
AP Lifanov
BP Berman
BY Chan
C Zhang
D Bachtrog
DL Halligan
DS Johnson
E Emberly
EA Glazov
EE Hare
EH Davidson
F Poulin
Garmay Leung
H Janssens
I Abnizova
L Li
M Klingler
Michael B. Eisen
MR Kantorovitz
MS Halfon
N Pierstorff
N Rajewsky
Nicholas James Provart
S Prabhakar
S Sinha
XY Li
YH Grad
Publication venue: Public Library of Science
Publication date: 01/09/2009

Recognizing regulatory sequences in genomes is a continuing challenge, despite a wealth of available genomic data and a growing number of experimentally validated examples. We discuss here a simple approach to search for regulatory sequences based on the compositional similarity of genomic regions and known cis-regulatory sequences. This method, which is not limited to searching for predefined motifs, recovers sequences known to be under similar regulatory control. The words shared by the recovered sequences often correspond to known binding sites. Furthermore, we show that although local word profile clustering is predictive for the regulatory sequences involved in blastoderm segmentation, local dissimilarity is a more universal feature of known regulatory sequences in Drosophila. Our method leverages sequence motifs within a known regulatory sequence to identify co-regulated sequences without explicitly defining binding sites. We also show that regulatory sequences can be distinguished from surrounding sequences by local sequence dissimilarity, a novel feature in identifying regulatory sequences across a genome. Source code for WPH-finder is available for download at http://rana.lbl.gov/downloads/wph.tar.gz

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Large-Scale Identification of Mirtrons in Arabidopsis and Rice

Author: C Johnson
Chaogang Shao
E Berezikov
E Huala
EA Glazov
JE Babiarz
JG Ruby
JO Westholm
JW Brown
K Miyoshi
K Okamura
M Nakano
MA German
MJ Axtell
MW Jones-Rhoades
Nicholas James Provart
O Voinnet
P Steffen
Q Yuan
QH Zhu
RW Carthew
S Griffiths-Jones
SF Altschul
T Barrett
VN Kim
WJ Chung
X Chen
X Dai
Y Meng
Y Zhang
Yijun Meng
Publication venue: Public Library of Science
Publication date: 13/02/2012

A new catalog of microRNA (miRNA) species called mirtrons, which originate from spliced introns of gene transcripts, has recently been discovered in animals. However, only one putative mirtron, osa-MIR1429, has been identified in rice (Oryza sativa). We employed a high-throughput sequencing (HTS) data- and structure-based approach to perform a genome-wide search for mirtron candidates in both Arabidopsis (Arabidopsis thaliana) and rice. Five and eighteen candidates were discovered in the two plants, respectively. To investigate their biological roles, the targets of these mirtrons were predicted and validated based on degradome sequencing data. The results indicate that the mirtrons could guide target cleavage to exert their regulatory roles post-transcriptionally, a finding that needs further experimental validation.

Public Library of Science (PLOS)

Crossref

PubMed Central

FigShare

Human Gene Coexpression Landscape: Confident Network Derived from Tissue Transcriptomic Profiles

Author: AJ Enright
Alberto Risueño
B Poree
BM Bolstad
C Magee
C Prieto
Carlos Prieto
Celia Fontanillo
D Panne
E Eisenberg
F Murtagh
GD Bader
HK Lee
I Tirosh
Javier De Las Rivas
JK Choi
JM Stuart
JN Suojanen
JV Falvo
KF Aoki-Kinoshita
LL Hsiao
LW Chang
M Barnes
M Kypriotou
Nicholas James Provart
OL Griffith
P Shannon
PB Dallas
PM Magwene
R Suzuki
RA Irizarry
S Brohee
S Calza
TW Loong
V van Noort
WK Lim
WM Liu
Y Wang
Publication venue: Public Library of Science
Publication date: 19/11/2012

This is an open-access article distributed under the terms of the Creative Commons Attribution License. [Background]: Analysis of gene expression data using genome-wide microarrays is a technique often used in genomic studies to find coexpression patterns and locate groups of co-transcribed genes. However, most studies done at global "omic" scale are not focused on human samples, and those that are very often include heterogeneous datasets, mixing normal with disease-altered samples. Moreover, the technical noise present in genome-wide expression microarrays is another well-reported problem that is often not addressed with robust statistical methods, and the estimation of errors in the data is not provided. [Methodology/Principal Findings]: Human genome-wide expression data from a controlled set of normal-healthy tissues is used to build a confident human gene coexpression network avoiding both pathological and technical noise. To achieve this we describe a new method that combines several statistical and computational strategies: robust normalization and expression signal calculation; correlation coefficients obtained by parametric and non-parametric methods; random cross-validations; and estimation of the statistical accuracy and coverage of the data. All these methods provide a series of coexpression datasets where the level of error is measured and can be tuned. To define the errors, the rates of true positives are calculated by assignment to biological pathways. The results provide a confident human gene coexpression network that includes 3327 gene-nodes and 15841 coexpression-links, and a comparative analysis shows good improvement over previously published datasets. Further functional analysis of a subset core network, validated by two independent methods, shows coherent biological modules that share common transcription factors. The network reveals a map of coexpression clusters organized in well-defined functional constellations.
Two major regions in this network correspond to genes involved in nuclear and mitochondrial metabolism, and investigations on their functional assignment indicate that more than 60% are house-keeping and essential genes. The network displays new, previously undescribed gene associations, and it allows some unknown, unassigned genes to be placed in a functional context based on their interactions with known gene families. [Conclusions/Significance]: The identification of stable and reliable human gene-to-gene coexpression networks is essential to unravel the interactions and functional correlations between human genes at an omic scale. This work contributes to this aim, and we are making the validated human gene coexpression networks available to the scientific community, to allow further analyses on the network or on some specific gene associations. The data are available free online at http://bioinfow.dep.usal.es/coexpression/. © 2008 Prieto et al. Funding and grant support was provided by the Ministry of Health, Spanish Government (ISCiii-FIS, MSyC; Project reference PI061153) and by the Ministry of Education, Castilla-Leon Local Government (JCyL; Project reference CSI03A06). Peer Reviewed

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

Digital.CSIC

The next generation of training for Arabidopsis researchers: Bioinformatics and Quantitative Biology

Author: A Gehan Malia
Alexander Buksch
Bastow Ruth
C Meyers Blake
Chris Town
Cranos Williams
Crispin Taylor
Doreen Ware
Edgar Spalding
Erich Grotewold
Gabriel Krouk
J Doherty Colleen
J Provart Nicholas
Jim Beynon
Joanna Friesner
Julia Bailey-Serres
L Eveland Andrea
M Brady Siobhan
Matthew Vaughn
Michael Gonzales
Molly Megraw
Murray James
Olivia Wilkins
Pascal Falter-Braun
R Dinneny Jose
Richard Vierstra
RJ Cody Markelz
Robin Buell
Rodrigo Gutierrez
Roger Smith
Sarah Assmann
Shisong Ma
Sue Rhee
Taku Demura
Tracy Teal
U Torii Keiko
Ute Kramer
Volker Brendel
Wolfgang Busch
Publication venue: American Society of Plant Biologists (ASPB)
Publication date: 01/12/2017

It has been more than 50 years since Arabidopsis (Arabidopsis thaliana) was first introduced as a model organism to understand basic processes in plant biology. A well-organized scientific community has used this small reference plant species to make numerous fundamental plant biology discoveries (Provart et al., 2016). Due to an extremely well-annotated genome and advances in high-throughput sequencing, our understanding of this organism and other plant species has become even more intricate and complex. Computational resources, including CyVerse, Araport, The Arabidopsis Information Resource (TAIR), and BAR, have further facilitated novel findings with just the click of a mouse. As we move toward understanding biological systems, Arabidopsis researchers will need to use more quantitative and computational approaches to extract novel biological findings from these data. Here, we discuss guidelines, skill sets, and core competencies that should be considered when developing curricula or training undergraduate or graduate students, postdoctoral researchers, and faculty. A selected case study provides more specificity as to the concrete issues plant biologists face and how best to address such challenges.

Online Research @ Cardiff

Exploring the Switchgrass Transcriptome Using Second-Generation Sequencing Technology

Background: Switchgrass (Panicum virgatum L.) is a C4 perennial grass and widely popular as an important bioenergy crop. To accelerate the pace of developing high-yielding switchgrass cultivars adapted to diverse environmental niches, the generation of genomic resources for this plant is necessary. The large genome size and polyploid nature of switchgrass make whole genome sequencing a daunting task even with current technologies. Exploring the transcriptional landscape using next-generation sequencing technologies provides a viable alternative to whole genome sequencing in switchgrass. Principal Findings: Switchgrass cDNA libraries from germinating seedlings, emerging tillers, flowers, and dormant seeds were sequenced using Roche 454 GS-FLX Titanium technology, generating 980,000 reads with an average read length of 367 bp. De novo assembly generated 243,600 contigs with an average length of 535 bp. Using the foxtail millet genome as a reference greatly improved the assembly and annotation of switchgrass ESTs. Comparative analysis of the 454-derived switchgrass EST reads with other sequenced monocots including Brachypodium, sorghum, rice and maize indicated a 70–80% overlap. RPKM analysis demonstrated unique transcriptional signatures of the four tissues analyzed in this study. More than 24,000 ESTs were identified in the dormant seed library. In silico analysis indicated that there are more than 2000 EST-SSRs in this collection. Expression of several orphan ESTs was confirmed by RT-PCR. Significance: We estimate that about 90% of the switchgrass gene space has been covered in this analysis. This study nearl

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

Systems Biology of the Clock in Neurospora crassa

Author: A Correa
A Pierani
A Wagner
AC Froehlich
AJ Saldanha
AM Pregueiro
AP Gerber
B Emery
B Lakowski
BD Aronson
BH Miller
CA Jones
D Battogtokh
D Fambrough
D Violin
DA Logan
DC Clarke
EH Davidson
G Mannhaupt
H Nakashima
H.-Bernd Schüttler
HR Ueda
HV Colot
J Arnold
J Dai
J Smith
J Tegner
James Griffith
JC Dunlap
JD Storey
JE Galagan
JJ Loros
Jonathan Arnold
JP Townsend
JWC Locke
K Kaldi
K Lee
KS Brown
L Ma
L Youssar
LA Sawyer
M Davidian
M Görl
M Inada
M Nowrousian
M Xiong
MB Eisen
MB Elowitz
MF Covington
MJ De Hoon
MJ McDonald
MK Yeung
MW Vitalini
NH Giles
Nicholas James Provart
NP D'mello
NY Garceau
P Bloomfield
P Cheng
P Ruoff
Q He
R Alves
RA Gutierrez
RH Davis
RJ Muirhead
RM De Paula
Roger Nilsen
Rosemary Kim
S Dharmananda
S Masloff
SH Strogatz
SJ Collis
SK Crosthwaite
SL Harmer
SM Jazwinski
T Ideker
T Kasuga
T Schafmeier
TP Michael
TR Hughes
TS Gardner
W Dong
W Heisenberg
WJ Belden
Wubei Dong
Xiaojia Tang
Y Benjamini
Y Lin
Y Liu
Y Yang
Y Yu
Yihai Yu
ZA Lewis
Publication venue: Public Library of Science
Publication date: 29/08/2008

A model-driven discovery process, Computing Life, is used to identify an ensemble of genetic networks that describe the biological clock. A clock mechanism involving the genes white-collar-1 and white-collar-2 (wc-1 and wc-2) that encode a transcriptional activator (as well as a blue-light receptor) and an oscillator frequency (frq) that encodes a cyclin that deactivates the activator is used to guide this discovery process through three cycles of microarray experiments. Central to this discovery process is a new methodology for the rational design of a Maximally Informative Next Experiment (MINE), based on the genetic network ensemble. In each experimentation cycle, the MINE approach is used to select the most informative new experiment in order to mine for clock-controlled genes, the outputs of the clock. As much as 25% of the N. crassa transcriptome appears to be under clock control. Clock outputs include genes with products in DNA metabolism, ribosome biogenesis in RNA metabolism, cell cycle, protein metabolism, transport, carbon metabolism, isoprenoid (including carotenoid) biosynthesis, development, and varied signaling processes. Genes under the control of the transcription factor complex WCC (= WC-1/WC-2) were resolved into four classes: circadian only (612 genes), light-responsive only (396), both circadian and light-responsive (328), and neither circadian nor light-responsive (987). In each of the three cycles of microarray experiments, the data support auto-regulation of wc-1 and wc-2 by WCC. Among 11,000 N. crassa genes, a total of 295 genes, including a large fraction of phosphatases/kinases, appear to be under the immediate control of the FRQ oscillator, as validated by 4 independent microarray experiments. Ribosomal RNA processing and assembly, rather than its transcription, appears to be under clock control, suggesting a new mechanism for the post-transcriptional control of clock-controlled genes.

Public Library of Science (PLOS)

Crossref

PubMed Central

A Genome-Wide Gene Function Prediction Resource for Drosophila melanogaster

Author: A Liaw
A Statnikov
A Vazquez
AC Edwards
AC Gavin
AJ Walhout
AM Johansson
B Estrada
C Stark
CJ Echeverri
CL Myers
David E. Hill
DB Johnson
EM Marcotte
Frederick P. Roth
G Obozinski
GE Carney
GW Muse
H Agaisse
H Yu
Han Yan
HJ Lee
HL Liang
HN Chua
I Carrera
I Flockhart
IH Witten
J Beaver
J Jemc
J Reboul
J Wang
J Yu
JC Costello
JF Rual
JG Mezey
JG Sorensen
John E. Beaver
JZ Maines
K Venkatesan
KA Boltz
Kavitha Venkatesan
KC Gunsalus
KD Pruitt
KE Weber
L Breiman
L Giot
LC Firth
M Ashburner
M Johnson
M Kanehisa
M Tasan
M Umemori
Marc Vidal
ME Cusick
Michael E. Cusick
ML Whitfield
MN Arbeitman
Muhammed A. Yildirim
N Robine
N Simonis
NA Terry
Nicholas James Provart
Niels Klitgord
NJ Mulder
Norbert Perrimon
P Braun
P Mourikis
P Muller
P Tomancak
P Uetz
R Sharan
RB Beckstead
RJ Wilson
RL Tatusov
S Aerts
S Li
SB Kotsiantis
T Brody
T Ito
Tong Hao
V Reinke
W Tian
X Deng
X Qin
X Wang
X Wu
Y Ho
Publication venue: Public Library of Science
Publication date: 01/08/2010

Predicting gene functions by integrating large-scale biological data remains a challenge for systems biology. Here we present a resource for Drosophila melanogaster gene function predictions. We trained function-specific classifiers to optimize the influence of different biological datasets for each functional category. Our model predicted GO terms and KEGG pathway memberships for Drosophila melanogaster genes with high accuracy, as affirmed by cross-validation, supporting literature evidence, and large-scale RNAi screens. The resulting resource of prioritized associations between Drosophila genes and their potential functions offers a guide for experimental investigations.

Public Library of Science (PLOS)

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central